An Adventure In Analytics

Lemon
6 min read · Oct 25, 2016

Ad blocking software is everywhere, including the browser I’m using right now. This is for a number of valid reasons: Website ads can be invasive, ugly, slow and expensive, and can sometimes serve up malware. Because of this, a growing number of people are doing all their web browsing behind an ad blocker. This no doubt improves web browsing for the people using it, but also fundamentally disrupts how monetizing anything on the internet works.

I’m not going to discuss the effects of this on an ad-supported economy, because I don’t build ad-supported websites and so don’t have much to say other than this: website ads are terrible in many ways, but if you like a website, you should whitelist it.

Instead, I want to talk about website analytics, which became collateral damage as ad block use grew. For the makers of the ad blockers, there is some nobility in their intent. Companies like Facebook (and plenty of others, but Facebook especially) track you throughout the internet, collecting data to build a profile on you. They do it even if you’re not logged in and they don’t care if you opt out. This data is collected, packaged, and sold over and over again, to any interested parties, for any reason. That’s the internet we use, and that’s why things are free.

But third party plugins, such as AdBlock, uBlock Origin, Ghostery and Disconnect, interfere with all of that. They intercept and trash those little scripts before they’ve had a chance to run, effectively shielding you from all the downsides while still letting you have the things you want. And in some cases, they’re blocking a lot. Here are some numbers that just came up when I visited a few randomly chosen websites with both AdBlock and Disconnect running…

17 websites want to know when I’m reading the New York Times.
  • Vice | 3 ads, 21 scripts
  • Wall Street Journal | 10 ads, 20 scripts
  • Forbes | AdBlock actually prevents me from viewing Forbes, which is just fine with me
  • ballp.it | 0 ads, 4 scripts
  • InfoWars | 10 ads, ??? scripts (I don’t know how many, but I know it’s more than 99)

With the exception of Alex Jones’ site, a lot of these scripts are employed for perfectly understandable reasons. For example, one of those dings on ballp.it (a site that I made) is from the site loading a font from Google Fonts. I did this for a perfectly good reason: I like pretty fonts, and I don’t like implementing them. I think Cabin is a nice looking font, and I really do not want to do @font-face on a bunch of .woff files, because that is annoying. So, as I have done many times, I’m using the system that Google invented. The fonts work, load time is really quick, the system is free, and it’s incredibly easy to use. I implemented that on the site, and as a result, I traded a teensy tiny itty bitty bit of your privacy when you visit my site.

Just embed the font! And then do it like, 5 more times! Also sometimes it won’t work! (via CodePen)

There’s a lot of abstract examples of this, but here’s something else from my life: I wanted to put a comments section on a site I made, but I absolutely do not, under any circumstances, want to build the infrastructure for a comment section. So, I implemented Disqus on the site and it works great! Does that mean your data gets warehoused when you use it? Yeah, probably!

And when I’m building a website, the thing I’ll add every single time is Google Analytics. I do this for a very simple reason: When I build a website, I want to know how many people visit my website. I think, at the very least, I’m owed that much. There are other ways to achieve this, but I can install GA with one tiny chunk of code. And after I copy/paste that code, I’m up and running. Not only with number of visits, but also bounce rates, users, session duration, and a whole lot of data I can benefit from, such as “What countries do people come to my website from?”, “What sites linked to my site?” and “What kind of device are they using?”.

Like so many Google offerings, it’s elegant, it’s robust, it’s simple, and it’s free. And that’s why every single candidate in the 2016 primaries used it. Of the top 100,000 sites on the internet, 70% of them are using GA. My own Google accounts right now are reporting on 28 different instances of GA I’ve started and I think there’s a couple more I’m just not paying attention to.

So I get a really nice piece of software and in exchange I hand Google data about the people on my website. This is a trade-off that I, like most professional web developers, am willing to make, but it’s not worth anything if the data isn’t any good. And I’m starting to suspect my data is pretty bad.

These plugins I’ve been talking about have decided you aren’t willing to comply with the site’s decisions, and won’t allow the script to run. Perhaps even more crucially, new versions of Opera actually ship with a built in adblocker, marketing that as a major selling point. This means that all these users are effectively invisible to the website owner. Your experience is probably undiminished, but you come and leave as a ghost. Anything I would get as a result of your visit (even if “the thing” is excitement over a traffic spike) is not given. If more users block, my metrics show that I’m losing those users.

Here are two things that happen as a result:

People tend to install plugins on their desktop browser, and then spend longer periods of time screwing around on different internet sites. On their phones they have either stock or unaugmented browsers, but are probably browsing in shorter periods. Most of your web browsing probably happens on a computer, but the computer isn’t reporting data and your phone is. As such, site owners get a user profile that isn’t accurate. Mobile use is overreported and site design is changed to reflect it: deep navigation, search and hover effects give way to hamburger menus, endless scrolling and app links.

Also, you cede the demographics to the people who actually do get tracked. Ad blocking is more popular among users who are younger, richer and more tech-savvy, who are then removed from the selection. So when a site owner browses their data, they’ll see an audience that is older, poorer, and dumber than it actually is, and decisions will be made with these assumptions. Do you click on pop ups? Stupid people click on pop ups, and they’re being tracked.
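
To see how lopsided blocking bends the picture, here is a minimal Python sketch. Every number in it is made up for illustration (the visit counts and the block rates are assumptions, not measurements from any real site), and `observed_mobile_share` is a hypothetical helper, not part of any analytics tool.

```python
def observed_mobile_share(desktop_visits, mobile_visits,
                          desktop_block_rate, mobile_block_rate):
    """Share of *tracked* visits that look mobile once blockers hide some visitors."""
    seen_desktop = desktop_visits * (1 - desktop_block_rate)
    seen_mobile = mobile_visits * (1 - mobile_block_rate)
    return seen_mobile / (seen_desktop + seen_mobile)


# Hypothetical audience: 600 desktop visits and 400 mobile visits,
# so the real mobile share is 40%.
true_share = 400 / (600 + 400)

# Assumed block rates: 30% of desktop visits, 2% of mobile visits.
reported_share = observed_mobile_share(600, 400, 0.30, 0.02)

print(f"true mobile share:     {true_share:.1%}")
print(f"reported mobile share: {reported_share:.1%}")
```

With these assumed rates the dashboard would show mobile at roughly 48% when the real figure is 40%, and nobody’s behavior has changed at all.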

So, I’m going to run an experiment.

After a quick Twitter conversation with Marc Grabanski, I realized that the open source platform Piwik might make a big difference in the short term. It’s similar to Google Analytics, but it’s self hosted and doesn’t report to a network. Therefore it (should) bypass the blocker plugins and (should) report data more accurately, which (should) mean bigger numbers across the board.

As it happens, I do a somewhat popular podcast, which means I also have a website with a pretty good layer of data, and when that data changes, I’d be able to explain it. Setting up Piwik on The F Plus site was definitely more complicated to start than GA was, and it doesn’t serve as a solution for a site like damn.dog (another site I made, which doesn’t have the required PHP framework) but for this particular website, it’s going to be a pretty useful data sample.

As of this morning, I’m running Piwik and Google Analytics concurrently on the site, and I’m going to have both of those running for the next month. In that time, I’m going to check the numbers as they appear in GA, and compare them against the numbers I’m seeing in Piwik. It’s gonna be a struggle as learning anything new is (plus I don’t think it looks nearly as nice) but I’m gonna keep up on it to see what it can teach me.
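
One way that comparison could be run is plain arithmetic. This is a rough Python sketch, assuming Piwik catches every visit (which is the hope, not a proven fact) and using invented daily counts; `missing_share` is a hypothetical helper name, not something either tool provides.

```python
def missing_share(ga_visits, piwik_visits):
    """Fraction of Piwik-counted visits that Google Analytics never saw."""
    if piwik_visits <= 0:
        raise ValueError("need at least one Piwik visit to compare against")
    return (piwik_visits - ga_visits) / piwik_visits


# Invented (GA visits, Piwik visits) pairs, one per day. Not real data.
daily_counts = {
    "day 1": (820, 1000),
    "day 2": (760, 950),
    "day 3": (905, 1100),
}

for day, (ga, piwik) in daily_counts.items():
    print(f"{day}: GA missed {missing_share(ga, piwik):.1%} of visits")

# Compare the totals too, so one odd day doesn't dominate the picture.
total_ga = sum(ga for ga, _ in daily_counts.values())
total_piwik = sum(piwik for _, piwik in daily_counts.values())
print(f"overall: GA missed {missing_share(total_ga, total_piwik):.1%} of visits")
```

If the two tools agree, the share hovers near zero. A steady positive gap would be a rough estimate of how many visitors are invisible to GA, though it only holds up if the self-hosted tracker really is slipping past the blockers.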

Over the course of the month, I’m hoping I’ll learn something from the numbers and find out if that’ll change how I look at website traffic. And if it works better, I might start dropping GA completely.

Lemon

I do things to the internet. It does things to me as well. ahoylemon.xyz